First, generate data:
set.seed(0)
datobj = generate_data_generic(p=5, TT=300, fac=.5, nt=2000, dimdat = 3)
ylist = datobj$ylist
X = datobj$X
This produces three dimensional cytograms ylist and covariates X.
ylist is a list of length \(T=300\), the number of time points (or cytograms). Each element of ylist is an array with \(d=3\) rows (a single cytogram) and \(n_t\) columns. The number of columns \(n_t\) of each element in ylist can be different.
X is a \(T \times d\) matrix, whose \(t\)’th rows contain the relevant (environmental) factors of the \(t\)’th cytogram.
plot(ylist[[1]][,1:2], ylab="", xlab="", pch=16, col=rgb(0,0,1,0.2), cex=.5)
Especially if your data is a time series, it could be useful to plot the covariates once across time.
matplot(X, type = 'l')
Now, we estimate the data with fixed regularization parameters \(\lambda_\alpha=0.01\) and \(\lambda_\beta=0.01\), and \(K=10\) clusters.
Internally, flowmix() repeats the estimation 5 (the default) times, and returns the estimated model providing the best data fit.
numclust = 4
set.seed(0)
res = flowmix(ylist = ylist, X = X, numclust = numclust,
mean_lambda = 0.01, prob_lambda = 0.001,
nrep = 5)
print(res)
## [1] "57 out of 60 regression coefficients for the cluster centers, are zero."
## [1] "18 out of 20 regression coefficients for the cluster probabilities, are zero."
## [1] "The algorithm converged in 16 EM iterations."
The cluster probabilities look like this:
plot_prob(res)
The estimated cluster means along the three dimensions look like this:
par(mfrow = c(1,3), cex = 1.2)
for(idim in 1:3){
res$mn[,idim,] %>% matplot(type = 'l',
lty = 1, ylab = paste0("Mean, dim=", idim))
}
The estimated cluster means, drawn on scatterplots of the data two dimensions at a time, look like this:
par(mfrow = c(1,3))
ylim = c(-3,8)
xlim = c(-5,8)
for(dims in list(c(1,2), c(2,3), c(3,1))){
scatterplot_2d(ylist, res, 100, dims = dims, cex_fac=1, ylim=ylim, xlim=xlim)
}
Next, the first two dimensions only, shown at three different time points:
par(mfrow = c(1,3))
for(tt in c(1,50,200)){
scatterplot_2d(ylist, res, tt,
dims = c(1,2),
cex_fac = 1,
ylim = ylim,
xlim = xlim)
title(main = paste0("t=", tt, " out of ", res$TT))
}
Showing the model across time, in an animation:
par(mfrow = c(1,3), oma = c(2,2,2,2))
ylim = c(-3,8)
xlim = c(-5,8)
for(tt in 1:res$TT){
for(dims in list(c(1,2), c(2,3), c(3,1))){
scatterplot_2d(ylist, res, tt, dims = dims, cex_fac=1, ylim=ylim, xlim=xlim)
}
mtext(outer = TRUE,
text = paste0("t=", tt, " out of ", res$TT),
cex = 2)
}